Training Robust Acoustic Models Using Features of Pseudo-Speakers Generated by Inverse CMLLR Transformations
Authors
Abstract
In this paper, a novel acoustic model training method based on speech feature generation is proposed. Speaker adaptation methods have been widely used for decades, but all existing adaptation methods need adaptation data. In contrast, our proposed method creates speaker-independent acoustic models that cover not only known but also unknown speakers. We do this by generating features with inverse maximum likelihood linear regression (MLLR) transformations and then training our models on these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters that express the MLLR transformation matrices of the existing speakers is estimated. Next, we generate pseudo-speaker MLLR transformations by sampling the weight parameters from this distribution, and apply the inverse of each transformation to the normalized features of the existing speakers to generate the pseudo-speakers’ features. Finally, using these features, we train the acoustic models. Evaluation results show that the resulting acoustic models are robust to unknown speakers.
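The steps described in the abstract can be sketched in NumPy as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the synthetic transforms, the diagonal Gaussian over the PCA weights, and the convention that each transform maps speaker features to the normalized space as y = A x + b are all assumptions made for the example.

```python
import numpy as np

def fit_transform_basis(transforms, n_bases):
    """PCA over flattened per-speaker affine transforms [A | b] of shape (S, d, d+1)."""
    flat = transforms.reshape(transforms.shape[0], -1)
    mean = flat.mean(axis=0)
    # Rows of vt are the principal directions of the centered transforms.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    bases = vt[:n_bases]
    weights = (flat - mean) @ bases.T  # per-speaker weights, shape (S, n_bases)
    return mean, bases, weights

def sample_pseudo_transform(mean, bases, weights, d, rng):
    """Sample weights from a diagonal Gaussian (an assumption) and rebuild [A | b]."""
    mu = weights.mean(axis=0)
    sigma = weights.std(axis=0, ddof=1)
    w = rng.normal(mu, sigma)
    W = (mean + w @ bases).reshape(d, d + 1)
    return W[:, :d], W[:, d]

def apply_inverse(A, b, normalized_feats):
    """Invert y = A x + b for each row y of normalized_feats, shape (N, d)."""
    return np.linalg.solve(A, (normalized_feats - b).T).T

# Synthetic stand-ins for transforms estimated from real speakers (hypothetical data).
rng = np.random.default_rng(0)
S, d, N = 8, 3, 5
A_spk = np.eye(d) + 0.05 * rng.standard_normal((S, d, d))
b_spk = 0.1 * rng.standard_normal((S, d, 1))
transforms = np.concatenate([A_spk, b_spk], axis=2)

mean, bases, weights = fit_transform_basis(transforms, n_bases=2)
A, b = sample_pseudo_transform(mean, bases, weights, d, rng)

normalized = rng.standard_normal((N, d))        # speaker-normalized features
pseudo_feats = apply_inverse(A, b, normalized)  # pseudo-speaker features
```

The resulting `pseudo_feats` would then be pooled with features from other sampled pseudo-speakers to train the acoustic model. In practice the number of bases and the form of the weight distribution would have to be chosen from the data.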
Similar resources
Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition
A novel speech feature generation-based acoustic model training method for robust speaker-independent speech recognition is proposed. For decades, speaker adaptation methods have been widely used. All of these adaptation methods need adaptation data. However, our proposed method aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve th...
Acoustic analysis and feature transformation from neutral to whisper for speaker identification within whispered speech audio streams
Whispered speech is an alternative speech production mode from neutral speech, which is used by talkers intentionally in natural conversational scenarios to protect privacy and to avoid certain content from being overheard or made public. Due to the profound differences between whispered and neutral speech in vocal excitation and vocal tract function, the performance of automatic speaker identi...
Emotion recognition using linear transformations in combination with video
The paper discusses the use of linear transformations of Hidden Markov Models, normally employed for speaker and environment adaptation, as a way of extracting the emotional components from speech. A constrained version of the Maximum Likelihood Linear Regression (CMLLR) transformation is used as a feature for classifying a normal or aroused emotional state. We present a procedure of incre...
Acoustic Model Identification Using Inverse Model
Sound measured at various points around the environment can be evaluated by a series of multi-pole sources and their acoustic strength can be acquired. In this numerical study, a method, called the inverse method, was examined to achieve this goal. A variety of arrangements of different sources were considered and the acoustic strength of these sources was acquired. Through the application of t...
Rapid unsupervised speaker adaptation robust in reverberant environment conditions
We expand the conventional rapid adaptation based on N-closest speakers' sufficient statistics (suff stat) to achieve robustness under reverberant conditions. We integrated our fast dereverberation technique based on optimized multi-band spectral subtraction as pre-processing. This removes the late reflection components of the reverberant signal quickly and effectively. Speakers’ suff stat are then ...
Publication date: 2011